The elicited probability approach is based on a transition matrix which
relates the current state vector to a set of state transformation operators.
The matrix elements are conditional probabilities elicited from experts (or
determined by collecting statistics). The state transformation operators are
rules which dynamically change the state of the simulation when selected by the
application of Bayesian algorithms. The basic mechanism can be used to select
operators in a hierarchic manner by incorporating them in higher level
transformation operators.

The adaptive decision approach uses pattern recognition to learn opponent
behavior from instructor opponent controllers (operators). This approach is based
on decision modeling and utility theory. The adaptive learning algorithm acts
as a pattern classifier and is used to identify biases in operator decision
policy as a response to classes or patterns in the input data. The
Multi-attribute Utility (MAU) model is used to capture the decision behavior of the
operator. In the MAU model, the consequences of every action are considered
to be decomposable according to a single common set of attributes.

The heuristic search approach provides a mechanism by which the opponent
responds to actions taken by friendly forces with a course of action which leads
to the achievement of some enemy goal. A state space model is used to
represent the problem domain. Each state is a complete description of the
tactical situation as it exists at a particular instant of time. An action
converts one state into another. The opponent asks the question, "What sequence
of actions can transform the current state into a desired goal state?" The
basic search algorithm begins at a start node and expands successive nodes
until a goal node is encountered. Then the path from the initial node to that
goal node is the solution sought. Heuristic search algorithms use domain
specific knowledge to guide the search. Heuristic knowledge may apply to node
expansion or to path evaluation. In either case heuristic knowledge is used
to reduce the searching effort. Specific heuristic search algorithms are
discussed.
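
The search just described can be sketched in a few lines. The following is a minimal illustration, not one of the specific algorithms the report goes on to discuss: a greedy best-first search in which the heuristic orders node expansion. The toy domain (a state made of range to target and depth band, with the actions "close", "dive" and "rise") and all function names are assumptions introduced for the example.

```python
import heapq
from itertools import count


def best_first_search(start, is_goal, successors, heuristic):
    """Expand the most promising node first; return the path to a goal or None.

    The path is a list of (action, resulting_state) pairs.
    """
    tie = count()  # tie-breaker so the heap never has to compare states
    frontier = [(heuristic(start), next(tie), start, [])]
    visited = {start}
    while frontier:
        _, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        for action, nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                heapq.heappush(
                    frontier,
                    (heuristic(nxt), next(tie), nxt, path + [(action, nxt)]),
                )
    return None


# Hypothetical toy domain: state = (range_to_target, depth_band).
GOAL = (0, 1)


def successors(state):
    rng, depth = state
    moves = []
    if rng > 0:
        moves.append(("close", (rng - 1, depth)))
    if depth < 2:
        moves.append(("dive", (rng, depth + 1)))
    if depth > 0:
        moves.append(("rise", (rng, depth - 1)))
    return moves


def distance_to_goal(state):
    # Heuristic knowledge: how many single-step actions are still needed.
    return abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1])


plan = best_first_search((3, 0), lambda s: s == GOAL, successors, distance_to_goal)
print([action for action, _ in plan])
```

Here the heuristic is applied to node expansion; a path-evaluation variant would add the cost accumulated so far to the heap key.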

The production rules approach uses sets of situation-action pairs, called
"productions," to transform the current state to the next state. The
productions represent the problem specific knowledge. In addition to productions,
the production rule system contains a triggering mechanism that applies those
that are applicable, causing the situation to change. AND/OR graphs represent
one kind of production rule system. Production rule systems resemble the
human reasoning process, and can be used to answer the questions of how or why
a particular conclusion was reached by the system. Also, the user can
hypothesize a conclusion or desired final state and use the productions to
work backward toward an enumeration of the facts that would support the
hypothesis.
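
As a rough illustration of the triggering mechanism, the sketch below runs a forward-chaining loop over situation-action pairs and records which productions fired, which is the information needed to explain how a conclusion was reached. The rule names, the fact names and the conflict-resolution policy (first applicable rule wins) are assumptions made for the example, not details taken from the report.

```python
def run_productions(facts, productions, max_cycles=100):
    """Apply applicable productions until none changes the situation.

    Each production is (name, conditions, additions), where conditions and
    additions are sets of facts. Returns the final facts and the firing trace.
    """
    facts = set(facts)
    trace = []
    for _ in range(max_cycles):
        fired = False
        for name, conditions, additions in productions:
            # Applicable: all conditions hold and the rule would add something new.
            if conditions <= facts and not additions <= facts:
                facts |= additions
                trace.append(name)
                fired = True
                break  # re-scan from the top after every change
        if not fired:
            break
    return facts, trace


# Hypothetical productions for an opponent submarine.
PRODUCTIONS = [
    ("R1", {"contact_detected", "contact_close"}, {"threatened"}),
    ("R2", {"threatened", "deep_water"}, {"evade_deep"}),
    ("R3", {"contact_detected"}, {"track_contact"}),
]

final_facts, trace = run_productions(
    {"contact_detected", "contact_close", "deep_water"}, PRODUCTIONS
)
print(trace)
```

Working backward from a hypothesized conclusion would use the same rule table, matching on the additions instead of the conditions.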
SECTION I
INTRODUCTION
GENERAL


This report gives an overview of four types of models for simulating an 
intelligent opponent for enemy submarine maneuvers within the Navy Submarine 
Combat System Trainers (SCST). These models were selected for discussion
because they are the most promising. With variations of each model, the
possibility of combining models, and the use of sub-models, many combinations 
are possible. 


The specific objectives of the program are: 


a. Analysis of current requirements of Navy submarine tactical trainers 
with respect to the tactical behavior of simulated enemy submarines. 


b. Identification of potential knowledgeable opponent model algorithms
and techniques applicable to submarine tactics.


c. Evaluation of each potential model in terms of tactical maneuver
capabilities, model trainability, software requirements, trainee
performance measurement, and required research and development.


REQUIREMENTS ANALYSIS

The requirements for knowledgeable opponent models were given in an
earlier report (Leal, Purcell, Thomas, 1978). Figure 1-1 is reproduced from
this report. The flowchart shows a decision diagram for a submarine commander.


The following is a summary of the decisions that must be made by the
knowledgeable opponent model:


a. Overall Mission
      Transit
      Patrol (barrier, broad, choke point shipping or sub-routes)
      Type of sub (fixed decision)
b. Has friend been detected.
c. Has friend detected enemy (legitimate opponent).
d. Evade or not (or hide in deep water).


e. Tactic selection (course, depth, heading, maneuvers, etc.) 


f. Strategy selection 


g. Subgoals.


h. What sensors to use (active or passive).


i. What weapons to use if at all, reattack.


j. Use decoy or not.


False alarm?
Move within range, close to investigate.
Track friend.


Clear its baffles, turn and look behind it.


Since the opponent and friend have essentially the same decision 
structure, the same model which is developed for the opponent can also model
the friend. This brings up a number of interesting and useful possibilities: 


a. Play one model against the other. By doing this, it will be easier 
to debug the software. Also, it is possible to develop a set of 
performance baselines which can be used for further model development
and to develop evaluation guidelines.


b. The opponent model easily contains a model of the friend. Further 
levels of recursion are possible. For example, the friend can be 
aided by an opponent model which contains a friend model. 


c. Different models can play each other to evaluate which model is best. 

d. Different parameter values can be set for each model and the models
can play each other in order to evaluate the effectiveness of various
strategies and various assumptions regarding opponent capabilities.


It should be emphasized that when the same model is used for several 
purposes, different behavior can be created by varying model parameters.


POTENTIAL MODELS 

From an analysis of the requirements of the knowledgeable opponent model
and from an analysis of existing simulation and modeling techniques, four major 
approaches have been identified which show potential for model implementation. 
These approaches are: 

a. Elicited probability approach

b. Adaptive decision modeling approach 

c. Heuristic search approach 

d. Production rules approach 

The remainder of this report describes each approach and shows its
applicability to the ASW problem. The next phase of the program will establish
specific design and implementation relationships between each approach and the
simulation requirements, in addition to the evaluation of each model.
SECTION II
THE ELICITED PROBABILITY APPROACH


INTRODUCTION 


The elicited probability approach to scenario generation is logically
similar to Bayesian information processing. However, instead of aggregating
expert opinions to estimate the probabilities of complex events in a real
world, we use those probabilities to simulate the real world. The scenario
can represent a knowledgeable opponent as well as the environmental conditions.


The technique is based on a transition matrix which relates the current 
state vector to a set of state transformation operators. The components of 
the transition matrix are the conditional probabilities of each state 
transformation operator (or rule which dynamically changes the environment),
given the value of each state vector variable. These conditional probabilities
can be estimated by experts on the behavior of the environment being simulated, 
or statistics can be collected to determine them. 


The next step is to compute the conditional probabilities of each state
transformation operator, given the current state vector. The actual state
transformation operators applied to the current state vector are chosen on
the basis of these probabilities by means of a Monte Carlo selection
procedure. Alternatively, the transformation operator with the highest
probability could be selected. The state transformation operators are then
executed to obtain a new state vector.


The basic mechanism can be used to select independently n state
transformation operators, one from each of n sets of operators, to be
applied in parallel. The basic mechanism can also be used to select operators
in a hierarchic (dependent) manner by incorporating them in higher level
transformation operators.


THE BAYESIAN MODEL 


The Bayesian model provides a mechanism for transforming a state space 
representation of the environment at discrete times. Figure 2-1 schematically 
represents the model. The mechanism is similar to that used to aggregate 
expert opinion in Bayesian information processing systems. The current state 
vector is used to select probabilities from a conditional probability matrix 
relating a set of state transformation operators to the current state vector. 
These probabilities are aggregated using Bayes' theorem to obtain a 
transformation probability vector. This vector contains the probability 
that, given the current state vector, each transformation operator in the 
set will be selected. Operators are selected from the set using a Monte Carlo 
selection procedure based on these probabilities. Finally, the selected 
operator is invoked to transform the current state vector to the next state. 


In most situations, a given operator will transform only a subset of the
state vector. Therefore, it will be necessary to invoke a number of
transformation operators "simultaneously." These operators are selected by
repeating the above process, using different sets of operators and
correspondingly different conditional probability matrices.


THE STATE VECTOR 


The state of the scenario generator (and, thus, the scenario) at time
t is represented by a state vector $Z^t$. This vector is comprised of discrete
state variables:


$Z^t = [z_1^t, z_2^t, \ldots, z_n^t]$    (2-1)


The values of all state variables define the state of the scenario at any
time, t.


THE CONDITIONAL PROBABILITY MATRIX 


The conditional probability matrix (P-Matrix) relates the current state
vector, $Z^t$, to a set of state transformation operators. The components of the
matrix are the conditional probabilities that, given the occurrence of the
state transformation operator $T_j$, the ith state variable had a value $z_i$:
$p(z_i|T_j)$. These probabilities are estimated by experts on the behavior of the
environment being simulated. The expert estimates can be based upon experience,
upon real world measurements, upon theoretical models, etc. It is also
possible to determine the conditional probabilities by collecting statistics
during an actual training session in which the instructors are controlling
opponent actions.
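
Collecting statistics from a training session amounts to counting how often each state-variable value was present when the instructor invoked each operator. The sketch below is one way to do that, under stated assumptions: the session log format, the variable and operator names, and the additive smoothing term (which keeps unseen combinations from getting probability zero) are all introduced for illustration and are not part of the report.

```python
from collections import Counter, defaultdict


def estimate_p_matrix(records, values, smoothing=1.0):
    """Estimate p(z_i = value | T_j) from logged instructor actions.

    records: list of (operator, {variable: value}) observations.
    values:  {variable: [all possible values]}.
    Returns a dict keyed by (variable, value, operator).
    """
    op_counts = Counter(op for op, _ in records)
    joint = defaultdict(Counter)  # (operator, variable) -> counts of values
    for op, state in records:
        for var, val in state.items():
            joint[(op, var)][val] += 1

    p_matrix = {}
    for op, n in op_counts.items():
        for var, vals in values.items():
            denom = n + smoothing * len(vals)
            for val in vals:
                p_matrix[(var, val, op)] = (joint[(op, var)][val] + smoothing) / denom
    return p_matrix


# Hypothetical session log: which operator the instructor chose in which state.
LOG = [
    ("evade", {"range": "close"}),
    ("evade", {"range": "close"}),
    ("evade", {"range": "far"}),
    ("proceed", {"range": "far"}),
]
VALUES = {"range": ["close", "far"]}

p_matrix = estimate_p_matrix(LOG, VALUES)
print(p_matrix[("range", "close", "evade")])
```

With the smoothing term set to zero the estimates reduce to raw relative frequencies.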


Two vectors of a priori probabilities, also estimated by experts, are
required. The components of the first vector, $P_T$, are the a priori
probabilities that each state transformation operator will be selected.
They are represented as follows:


$P_T = [p(T_1), p(T_2), \ldots, p(T_m)]$    (2-2)


The components of the second vector, $P_Z$, are the a priori probabilities of the
occurrence of each state component of the Z vector. They are represented
as follows:


$P_Z = [p(z_1), p(z_2), \ldots, p(z_n)]$    (2-3)


The a priori probabilities don't have to be estimated with great precision
because, as the scenario unfolds, they have less and less effect over the
behavior of the scenario.


PROBABILITY AGGREGATION


The probability that a state transformation will be selected is computed
by aggregating the conditional probabilities according to Bayes' rule:


$p(T_j|Z^t) = \frac{p(T_j)\, p(Z^t|T_j)}{p(Z^t)}$    (2-4)


If we assume that the variables which comprise the state vector are
independent, then


$p(Z^t) = \prod_{i=1}^{n} p(z_i^t)$    (2-5)

and

$p(Z^t|T_j) = \prod_{i=1}^{n} p(z_i^t|T_j)$    (2-6)

Thus, equation 2-4 becomes

$p(T_j|Z^t) = \frac{p(T_j) \prod_{i=1}^{n} p(z_i^t|T_j)}{\prod_{i=1}^{n} p(z_i^t)}$    (2-7)


When equation 2-7 is implemented, the $p(T_j|Z^t)$ are normalized; thus, the
denominator in equation 2-7 is not needed.
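
A minimal sketch of the aggregation step, assuming the independence condition holds: for each operator, multiply its a priori probability by the conditional probabilities of the observed state-variable values, then normalize across operators so the denominator is never computed. The dictionary layout and the example variable and operator names are assumptions for illustration only.

```python
def aggregate(state, priors, p_matrix):
    """Return the normalized probability of each operator given the state.

    state:    {variable: observed value}
    priors:   {operator: a priori probability of the operator}
    p_matrix: {(variable, value, operator): conditional probability}
    """
    scores = {}
    for op, prior in priors.items():
        score = prior
        for var, val in state.items():
            score *= p_matrix[(var, val, op)]
        scores[op] = score

    total = sum(scores.values())
    if total == 0:
        raise ValueError("every operator has zero probability in this state")
    # Normalizing replaces division by the probability of the state vector.
    return {op: score / total for op, score in scores.items()}


# Hypothetical elicited values.
PRIORS = {"evade": 0.5, "proceed": 0.5}
P_MATRIX = {
    ("detected", "yes", "evade"): 0.8,
    ("detected", "yes", "proceed"): 0.2,
    ("range", "close", "evade"): 0.6,
    ("range", "close", "proceed"): 0.3,
}

posterior = aggregate({"detected": "yes", "range": "close"}, PRIORS, P_MATRIX)
print(posterior)
```

In this example the unnormalized scores are 0.24 for "evade" and 0.03 for "proceed", so "evade" receives 0.24 / 0.27 of the probability after normalization.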


The assumption that the variables which comprise the state vector are 
independent is a crucial one. The most practical way to meet this condition
is to take care to define the state vector such that its variables are independent. If
there are dependencies in the state vector, they may not noticeably affect 
the behavior of the scenario (e.g., environment, opponent's actions). This 
can be tested by using the model to generate behavior which is viewed by the 
person from whom the probabilities were elicited. If the behavior is not as 
desired, the elicited probability values can be fine-tuned until the proper 
behavior is obtained. 


One technique of handling dependencies in the state vector is to also 
elicit the covariance matrix representing the correlation among state variables. 
This matrix can then be used in one of two methods: 


a. The problem is transformed into a domain where 
independence holds. 


b. The covariance matrices are used to derive weights 
to compensate for dependence. 


Both methods have several disadvantages:


a. The covariance matrices are dependent on the order of
processing state variables; a different covariance
matrix must be used for each order.


b. The covariance matrices involve either asking people
to estimate means and standard deviations, or polling
a group of experts and collecting these statistics.


c. When the probabilities are subjectively determined 
(by elicitation), the precision of the problem is 
such that the covariance matrices may be meaningless. 


In general, the complexity of using the covariance matrices seems to exceed 
that justified by meaning and relevance. 


Another method of handling dependencies in the state vector is to construct
a new set of variables based on permutations of some of the dependent variables.
This approach is simple, but leads to a rapid increase in the size of the
state space.
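
One reading of this method is that dependent variables are replaced by a single joint variable whose values are all combinations of the originals. The sketch below follows that reading and counts how many conditional probabilities must then be elicited per operator; the variable names and value lists are illustrative assumptions.

```python
from itertools import product


def merge_dependent(values, dependent):
    """Replace the variables named in `dependent` with one joint variable."""
    joint_name = "+".join(dependent)
    merged = {name: vals for name, vals in values.items() if name not in dependent}
    merged[joint_name] = list(product(*(values[name] for name in dependent)))
    return merged


def entries_per_operator(values):
    """Number of conditional probabilities to elicit for each operator."""
    return sum(len(vals) for vals in values.values())


# Hypothetical state variables.
VALUES = {
    "noise": ["yes", "no"],
    "range": ["very close", "near", "mid-range", "far"],
    "depth": ["shallow", "normal", "deep", "very deep"],
}

MERGED = merge_dependent(VALUES, ["noise", "range"])
print(entries_per_operator(VALUES), entries_per_operator(MERGED))  # prints: 10 12
```

Merging all three variables in this example would give 32 joint values, which shows how quickly the count grows as more variables are combined.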


STATE TRANSFORMATIONS 


There are virtually no restrictions on the kinds of state transformation 
operators which can be defined. A transformation operator may affect a 
single state variable and generate a constant output. It may also affect a
large number of state variables and make use of a complex decision strategy
to determine their values. The transformation operator may even determine 
the value of a variable for several subsequent time cycles. 


A transformation operator may make use of subsets of $Z^t$ which were not
used in selecting the operator. An operator may also make internal use of
Bayesian aggregations based upon additional conditional probability matrices
and subsets of $Z^t$. Thus, hierarchies of transformation operators can be
established.


Each transformation operator affects a set of one or more state variables.
The operators, in turn, are grouped according to which set of variables they
affect. These sets of variables must be disjoint because, after a single
operator is selected from each set, the selected operators are assumed to be
invoked simultaneously. If the sets of variables are not disjoint, the
order in which the selected operators are actually invoked will affect the
value of the transformed state vector. However, non-disjoint sets of variables
can be handled by establishing a hierarchy of operators within a "higher level"
operator.


The selection of one state transformation operator from each operator 
set is made by means of a Monte Carlo selection procedure. The probabilities 
of occurrence of each operator in the set are normalized to obtain a discrete 
cumulative distribution function. A uniformly distributed pseudorandom
number in the range [0,1] is then generated and its position in the
distribution function is used to select the operator. Alternatively, the 
operator with the highest probability could be selected. 
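
The selection procedure can be sketched as follows: build the cumulative distribution over the operators in a set, draw a uniform pseudorandom number, and locate it in the distribution. The function names and the example operator set are assumptions for illustration; the `rng` parameter exists only so a fixed source can be substituted for reproducible runs.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def monte_carlo_select(probabilities, rng=random):
    """Pick one operator with probability proportional to its weight."""
    operators = list(probabilities)
    cdf = list(accumulate(probabilities[op] for op in operators))
    total = cdf[-1]
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Scaling the draw by the total normalizes the distribution implicitly.
    u = rng.random() * total
    index = bisect_right(cdf, u)
    return operators[min(index, len(operators) - 1)]


def most_probable(probabilities):
    """Deterministic alternative: the operator with the highest probability."""
    return max(probabilities, key=probabilities.get)


# Hypothetical operator set with already aggregated probabilities.
MANEUVERS = {"hide": 0.2, "zigzag": 0.5, "attack": 0.3}

print(monte_carlo_select(MANEUVERS, random.Random(1)))
print(most_probable(MANEUVERS))
```

Running `monte_carlo_select` once per operator set, with a separate probability table for each set, gives the one-operator-per-set selection described above.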


In some experimental applications, it may be useful or necessary to
determine the probability that a state variable will have a particular value,
$p(z_i^{t+1}|Z^t)$. By restricting the kinds of allowable state transformation
operators to those that generate a constant (and unique) result, it is
possible to obtain these probabilities directly from the scenario generator.
If state transformation operator, $T_j$, outputs the same value for $z_i^{t+1}$
whenever it is invoked, and only $T_j$ outputs that value, then

$p(z_i^{t+1}|Z^t) = p(T_j|Z^t)$    (2-8)


If more complex transformation operators are used, $p(z_i^{t+1}|Z^t)$ becomes more
difficult to compute. A value can always be obtained, however, by making
statistical measurements of the behavior of the scenario generator.


The current state vector, $Z^t$, is transformed into $Z^{t+1}$ by the (assumed)
simultaneous invocation of all of the selected state transformation
operators. If the state vector is properly designed, it is possible to
use the Bayesian/Monte Carlo selection mechanism to choose all of these
operators. However, in many instances it may be more convenient to use
"external" mechanisms to select transformation operators for certain subsets
of the state vector. These externally controlled state vector subsets will
be collectively referred to as the $E^t$ subvector (see Figure 2-1). Examples
of externally controlled state variables would include clock driven variables
such as day and night, high and low tides, and events which occur on a fixed
schedule.


PROBABILITY ELICITATION 


Previous research has shown that human experts are good at estimating 
conditional probabilities, but poor at aggregating them (e.g., Edwards, 1962). 
Accordingly, the present scenario generator uses conditional probabilities 
elicited from experts and aggregates them automatically. First, expert 
inputs are used to:


a. Describe the environment to be modeled in terms of 
relevant state variables. 


b. Determine which variables are externally controlled 
and which are controlled by the Bayesian model. 


c. Define all of the transformations which change the 
state variables. 
Then, the expert is queried in detail to:


d. Estimate the a priori probabilities and the individual
conditional probabilities which constitute the entire
matrix.


The method of elicitation is simply to interview the expert and ask him the
probabilities. Bond and Rigney (1966) were able to elicit almost 650
conditional probabilities associated with electronic troubleshooting in one
hour using a simple questionnaire.


The process of probability elicitation is an iterative one which allows
the expert to refine his estimates. That is, once the initial estimates are
made, test scenarios are generated which allow the expert to see the
consequences of his estimates. He is then asked to modify his estimates to
make them more consistent with the desired behavior of the scenario generator.


ELICITED PROBABILITY APPROACH


Advantages or Strong Points
a. Simplicity; easy to develop, maintain, implement. 
b. Generates a probabilistic opponent and environment. 


c. Weights representing behavior are easy to elicit and 
to alter. 


d. State oriented; easy to switch between manual and 
automatic operation. 


Disadvantages or Weak Points 


a. It is difficult to alter structural aspects due to the 
need to avoid dependencies in the state vector. 


b. Difficult to insert logical statements to control 
the scenario. 


c. The application of state transformation operators 
may be order dependent. 


Examples of Elicited Probabilities. The following is a table of
a priori and conditional probabilities of state elements given detection
or non-detection. Note: Conditional probabilities are used as weights.


Opponent's decision: Have I been detected by the home submarine?

                                      | A Priori  | Have Been | Have Not
State Elements                        | Prob. of  | Detected  | Been Detected
                                      | State     |           |
--------------------------------------+-----------+-----------+--------------
Home gone active              - Yes   |           |           |
Opponent detects signals      - No    |           |           |
Has opponent detected         - Yes   |           |           |
home sub                      - No    |           |           |
Proximity of home submarine   - Very  |           |           |
                                Close |           |           |
Environment noisy                     |           |           |
Thermal layers                        |           |           |
--------------------------------------+-----------+-----------+--------------
A priori probability of detection     |           | .10       | .90
Decision: What maneuver should I select?

                                      | A Priori |               Maneuver
State Elements                        | Prob     | Hide | Run | Zigzag | Proceed | Attack
--------------------------------------+----------+------+-----+--------+---------+-------
How far is home:        Undetected    | .60      | 0    | etc.
                        Very Close    |          |      |
                        Near          |          | .30  |
                        Mid-range     | .05      |      |
                        Far           | .05      | .20  |
Has opponent decided    - Yes         |          |      | etc.
that home has
detected him?           - No          | .05      |      |
War or peace?           War           |          | etc. | etc.
                        Peace         | .99      |      |
Water depth:            Shallow       |          |      |
                        Normal        |          |      |
                        Deep          |          |      |
                        Very Deep     |          |      |
Noise:                  Yes           | .50      | etc. | etc.
How long on current     Short         |          |      |
maneuver?               Middle        |          |      |
                        End           |          |      |
AUXILIARY SYSTEMS

The above discussion covers the scenario generation part of the elicited
probability approach. In the system developed by Perceptronics,
a sensor system and an Intelligence Analysis System (IAS) were also
developed.


The sensor system, shown in Figure 2-2, would enable the opponent model
to deploy sensors and get sensor information. In the present application, 
however, it doesn't seem necessary to go into that level of detail. 


The function of the Intelligence Analysis System is to simulate an 
intelligence expert who knows how the environment and the opponent behave, 
but must rely upon status reports from the ASW trainee for data about the 
current status of the fleet. The IAS provides the trainee with an 
intelligence report (i.e., the probabilities of possible actions by the 
opponent based upon the information in the trainee's status report). 


The IAS is identical to the scenario generator, with two vital exceptions.
First, it uses a "current status state vector," which is generated from
the trainee's status report, instead of the vector, $Z^t$, which represents the
actual state of the environment. This means that the IAS intelligence report
will be only as accurate as the trainee's status report. Second, the
intelligence report is based upon the aggregated probabilities. Thus, no
Monte Carlo selection is made and no transformation operators are invoked.

By limiting the kinds of transformation operators used in the scenario
generator to those which generate a constant output, we ensure that the
aggregated probabilities of state transformations are the same as the
probabilities of the generated states (see Section 2.6). A functional
description of the Intelligence Analysis System is provided in Figure 2-3.


DECOY MODEL 


The Basic Concept. A decoy is a relatively cheap target which simulates
the target of value to the detectors. Decoys are characterized by cost and
their ability to simulate the target of value to various types of detectors.
A convenient number to represent this is the probability of fooling the
detector. The opponent would only deploy decoys if he knew he was being
hunted because otherwise he would increase the chance of detection.


The Representation. 


Detector:       ESM | SONAR |

Decoy:  Cheap
Figure 2-2. Flowchart of Sensor System
Decoy Deployment Decisions. The opponent must decide what decoy, if any,
to deploy. Once it is decided to deploy a decoy, it must be decided
what behavior to program for the decoy; this includes:

• trajectory for decoy to follow;


• countermeasures for the decoy to use.
SECTION III
THE ADAPTIVE DECISION MODELING APPROACH 
INTRODUCTION 


The adaptive decision approach to generating knowledgeable opponent 
behavior -- which uses pattern recognition -- is based on learning opponent 
behavior from instructor opponent controllers. This approach is based on 
decision modeling and utility theory. In the present application, all of 
the relevant information for selecting the opponent's next action is 
immediately available at the time it's needed. 


ADAPTIVE DECISION MODELING 


Work on adaptive decision-making is derived from the areas of behavioral
decision research and AI experience with learning networks. The unique
aspect of this approach is the capability to adjust model parameters on-line
and change decision strategy accordingly. In essence, the learning system
attempts to identify the decision process of the human operator on-line by
(a) successive observation of his actions, and (b) establishment of an interim
relationship between the input data set and the output decision (the model).
Learning in this context refers to a training process for adjusting model
parameters according to a criterion function. The object is to improve model
performance as a function of experience, or to match the model characteristics
to those of the operator.


Learning techniques have been used to model the decision strategy and to 
identify the sources of cognitive constraints on the human operator performing 
a dynamic prediction task (Rouse, 1972). Another example of an adaptive
model of the human operator through real time parameter tracking has been
reported by Gilstad and Fu (1970). Linear and piecewise-linear discriminant
functions were used to classify system gains, errors and error rate. The 
decision boundaries for classification were determined through a process of 
on-line learning, observing operator performance and parameter adjustment. The 
specific model used was applicable only to very limited tasks, and merely 
illustrated the feasibility of the technique. 
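
To make the idea of on-line parameter adjustment concrete, the sketch below uses a simple error-correcting rule on a linear discriminant: when the model's predicted choice differs from the operator's observed choice, the weights are moved toward the features of the chosen option and away from those of the wrongly predicted one. This particular update rule, the learning rate, and the option and feature values are assumptions chosen for illustration; they are not the models of Rouse or of Gilstad and Fu.

```python
def predict(weights, options):
    """Return the option whose weighted feature sum is largest."""
    return max(
        options,
        key=lambda name: sum(w * x for w, x in zip(weights, options[name])),
    )


def update(weights, options, operator_choice, rate=0.1):
    """Adjust the weights only when the model disagrees with the operator."""
    guess = predict(weights, options)
    if guess == operator_choice:
        return list(weights)
    chosen, wrong = options[operator_choice], options[guess]
    return [w + rate * (c - g) for w, c, g in zip(weights, chosen, wrong)]


# Hypothetical options, each described by two feature levels in [0, 1].
OPTIONS = {"evade": [0.9, 0.1], "attack": [0.2, 0.7]}

weights = [0.5, 0.5]
first_guess = predict(weights, OPTIONS)            # model's initial prediction
weights = update(weights, OPTIONS, "attack")       # operator was seen to attack
second_guess = predict(weights, OPTIONS)
print(first_guess, second_guess)
```

Repeating the update over successive observations is what lets the weights track the operator's decision policy as it is observed.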


A unique advantage of using a learning system lies in its capability to
act as a pattern classification mechanism. As such, it can be used to identify
biases in operator decision policy as a response to classes or patterns in the
input data (Tversky, et al., 1972). In the conventional Bayesian technique, the
pattern of events is decomposed into elementary data points. With the
assumption of independence, the elementary data points are aggregated to revise
the hypothesis. Effects of the data pattern do not bear on the decision.


In dynamic decision making, however, the temporal and spatial nature of
the data are highly significant. Since decision data appear as a pattern of
individual events, it is reasonable to assume that the subject responds to the
pattern as well as to the individual values. In fact, the pattern may contain
the greater amount of information. Classification of input patterns by the
learning mechanism can be accomplished by programmed cognizance of such data
features as: data with non-independent events, data with correlated events,
data with events which continuously vary with time, the number of elements
of decision data and the rate of change in the data points.


THE MAU MODEL 


Multi-attribute decision analysis is the most widely used approach for 
making evaluations involving multiple criteria. MAU methods decompose the 
complex overall evaluation problem into more manageable sub-problems of 
scaling, weighting, and combining criteria. In doing so, the MAU methods 
provide a rich framework for analysis, discussion, and feedback. This "divide 
and conquer" approach to evaluation involves defining the problem, identifying 
relevant dimensions of value, scaling and weighting the dimensions, and finally 
aggregating the dimensions into a single figure of merit for the system. 


The power of the multi-attribute approach lies in its level of analysis 
and flexibility. Sensitivity analyses of the level and weight of each 
dimension can provide indications of what aspects to concentrate tests on, or
what system elements to modify. Flexibility is present, since criteria can
be added or deleted as necessary. Also, the weights and levels can be quickly
adjusted according to new functional requirements and capabilities.


In the MAU model, the consequences of every action are considered to be
decomposable according to a single common set of attributes. The model computes
an aggregate multi-attribute utility (MAU) as a weighted sum of each consequence
attribute level ($A_{ik}$) multiplied by the importance or utility of the attribute
($W_i$). The calculated MAU of each action is used as the selection criterion:


$MAU_k = \sum_i W_i A_{ik}$

where
$MAU_k$ = the aggregate utility of action k,
$W_i$ = the importance weight of attribute i, and
$A_{ik}$ = the level of attribute i for action k.


Figure 3-1 shows the major components of the MAU model in block diagram 
form. Possible actions are parameterized in terms of attribute levels. The 
MAU calculator uses as inputs (1) the attribute levels of the given action, and 
(2) a vector of "attribute weights" which have been dynamically estimated for
a given operator by an adaptive model. 
Calculation of the multi-attribute utility for each action is central to
the operation of the model. The MAU calculation is shown in Figure 3-2. The
dot-product of the attribute level vector and the attribute weight vector
provides the aggregate MAU value. The attributes are scaled so that each
attribute level ranges from 0 to 1. Further, the orientation is arranged such
that each attribute contributes positively to the overall aggregate MAU. That
is, holding all other attribute levels constant, an increase in any attribute
level increases the MAU.
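
The calculation is small enough to show directly. In the sketch below the MAU of an action is the dot product of the weight vector and that action's attribute levels, and the action with the largest MAU is selected. The weights, the actions and their attribute levels are made-up values used only to exercise the code.

```python
def mau(weights, levels):
    """Aggregate utility: dot product of attribute weights and attribute levels."""
    if len(weights) != len(levels):
        raise ValueError("weights and levels must cover the same attributes")
    if any(not 0.0 <= level <= 1.0 for level in levels):
        raise ValueError("attribute levels are expected to be scaled to [0, 1]")
    return sum(w * a for w, a in zip(weights, levels))


def choose_action(weights, actions):
    """Select the action with the highest aggregate utility."""
    return max(actions, key=lambda name: mau(weights, actions[name]))


# Hypothetical weights for three attributes and two candidate actions.
WEIGHTS = [0.5, 0.3, 0.2]
ACTIONS = {
    "hide": [0.9, 0.2, 0.4],
    "attack": [0.3, 0.9, 0.6],
}

print(choose_action(WEIGHTS, ACTIONS))
```

Because every weight here is non-negative, raising any one attribute level of an action can only raise its MAU, which matches the orientation requirement stated above.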


ATTRIBUTE CHOICE


The determination of attributes to include in the decision model is 
probably of greater importance than the accurate assessment of the importance 
weights (Dawes, 1975). The following list of desirable characteristics for 
the attributes expands on Raiffa's (1969) recommendations of attribute 
independence, set completeness, and minimum dimensionality: 


a. Accessible. The levels of each factor should be easily and accurately
measurable.


b. Conditionally Monotonic. The factor level should be monotonic with
the criterion (preference) regardless of the constant values of other
factors.


c. Value Independent. The level of one attribute should not depend on 
the levels of the other attributes. This is to some extent a 
consequence of recommendation b. 


d. Complete. The set of attributes should present the operator's 
behavior as completely as possible. 


e. Meaningful. The attributes should be reliable and should demonstrate 
construct validity. Feedback based on the model attributes should be 
understandable to the operator. 


For the most part, these recommendations result in an attribute set that 
is measurable, predictive, and in accord with the axioms of utility theory. 
The recommendations also imply a limitation on the number of possible attributes. 
The requirements of independence and meaningfulness render any large set of
attributes unrealizable, because of the cognitive limitations of the human 
operator. 


ADVANTAGES OF THE MULTI-ATTRIBUTE UTILITY MODEL


The multi-attribute information utility model presented here is
characterized by several attractive features. These features, itemized below,
offer substantial advantage over the expected utility (EU) decision model. The
advantages arise out of the theoretical structure of the model, especially its
decomposition property, and have all been empirically demonstrated to some
degree in ongoing Perceptronics programs (Samet, Weltman, and Davis, 1976;
Steeb, Chen and Freedy, 1977).
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Generality. The adaptive, multi-attribute model for information 
selection holds a considerable amount of generality. It can be 
applied in situations where diagnostic actions can be decomposed 
into a small set of manageable, quantifiable attributes which have 
two critical characteristics. First, they must be logically related 
to the situation-specific demands. That is, their relevance to
specific situations must be known. Second, they must directly
impact upon a decision maker's choices among competing options. A
number of military decision-making environments have already been
demonstrated to fit this paradigm (e.g., Coats and McCourt, 1976; 
Hayes, 1964; McKendry, Enderwick and Harrison, 1971; Samet, 1975). 


Parsimony. The model is parsimonious; it need only assess an
operator's weights for a limited number of information dimensions
or attributes. Besides significantly reducing the model's
computational needs and software complexity, this feature reflects
findings of psychological experiments (e.g., Hayes, 1964; Slovic,
1975; Wright, 1974) and is in agreement with contemporary decision
theory (e.g., Tversky and Kahneman, 1974), all of which suggest that
a decision maker can only perform weighting and aggregation on a
relatively small number of the important dimensions in the decision
task. Also, when decisions are based on a manageable number of
information dimensions, they are easier to communicate and rationalize
-- especially in group decision-making situations (Gardiner and
Edwards, 1975). In complex situations, therefore, the reduction in
the number of model parameters in the proposed MAU model as compared
to the expected utility model is of major importance.


Robustness. Like other linear composition models, the multi-attribute 
decision model is robust; that is, its performance is not
significantly degraded by small perturbations in the model's
parameters (Dawes and Corrigan, 1974). Such robustness probably 
contributes to the finding that multi-attribute utility assessment 
techniques have proven, in certain instances, to be more reliable and 
valid than direct assessment procedures (Newman, 1975; Samet, 1976). 


Speed of Adaptation. The adaptive model adjusts all parameters with
each incorrectly predicted operator decision (i.e., action selection).
Thus, weights for specific troubleshooting attributes can be trained 
rapidly during sessions in which the operator performs the diagnostic 
task. This is in contrast to the current model, in which only the 
parameters of the chosen and predicted actions are adjusted in a given 
decision. 


Flexibility. The multi-attribute utility model is inherently flexible.
If prediction of troubleshooting behavior is not sufficiently accurate
(i.e., if attribute weights cannot be trained to stable values),
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additional features or attributes can be added and inappropriate ones 
deleted. The response to dynamic changes in conditions is similarly 
flexible. In instances where conditions change rapidly and radically, 
new sets of weights trained for the new conditions can be substituted. 
Such weight vectors could be prepared ahead of time by training them 
either in actual operational situations or in step-through simulations. 


UTILITY ESTIMATOR 


The dynamic utility estimation technique is based on a trainable pattern 
classifier. Figure 3-3 illustrates the mechanism. As the operator performs 
the task, the on-line utility estimator observes his choice among the available 
actions at each point in the sequence and views his decision-making as a 
process of classifying patterns consisting of varying attribute levels. The 
utility estimator attempts to classify the attribute patterns by means of a 
linear evaluation (discriminant) function. These classifications are compared 
with the operator's choices. Whenever they are incorrect, an adaptive, error- 
correction training algorithm is used to adjust the utilities. A comprehensive 
discussion of this technique can be found in Freedy, Davis, Steeb, Samet, and 
Gardiner (1976). 


TRAINING ALGORITHM 


On each trial, the model uses the previous utility weights (W_j) for each
attribute (j) to compute the multi-attribute utilities (MAU_k) for each action
(k). Thus,


    \mathrm{MAU}_k = \sum_j W_j A_{jk}                                  (3-1)


where

    W_j is the weight of the jth attribute, and

    A_{jk} is the level of the jth attribute associated with action k.

The model predicts that the operator will always prefer the action with the
maximum MAU value. If the prediction is correct (i.e., the operator chooses the
action with the highest MAU), no adjustments are made to the utility weights.
However, if the operator chooses an action having a lower MAU value, the model
adjusts the utility weights by pairing the chosen action with the predicted
action and applying the error correction training algorithm. In this manner,
the utility estimator "tracks" the operator's diagnostic strategy and learns
his utilities or weights for information attributes. The training rule used to
adjust the weights associated with each of the attributes is illustrated in
Figure 3-3.
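
The prediction step can be sketched in a few lines of Python. This is a minimal illustration rather than the report's implementation: the function names (mau, predict_action) and all numeric weights and attribute levels are hypothetical.

```python
from typing import Sequence


def mau(weights: Sequence[float], levels: Sequence[float]) -> float:
    """Multi-attribute utility of one action: sum of weight * attribute level."""
    return sum(w * a for w, a in zip(weights, levels))


def predict_action(weights: Sequence[float],
                   actions: Sequence[Sequence[float]]) -> int:
    """Return the index of the action with the maximum MAU value."""
    scores = [mau(weights, levels) for levels in actions]
    return max(range(len(actions)), key=scores.__getitem__)


# Hypothetical data: three attributes, three candidate actions.
weights = [0.5, 0.3, 0.2]
actions = [
    [0.2, 0.9, 0.1],
    [0.8, 0.1, 0.3],
    [0.4, 0.4, 0.4],
]

predicted = predict_action(weights, actions)
operator_choice = 1  # assumed observation of the operator's selection
needs_adjustment = predicted != operator_choice
```

Weights are left untouched when needs_adjustment is False, which mirrors the rule that correct predictions trigger no training.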


Actual in-task estimation appears feasible using pattern recognition 
techniques. Instead of batch processing, the pattern recognition methods refine 
the model decision-by-decision. Briefly, the technique considers the decision
maker to respond to the characteristics of the various alternatives as patterns,
classifying them according to preference. A linear discriminant function is
used to predict this ordinal response behavior and, when amiss, is adjusted
using error correcting procedures. This use of pattern recognition as a
method for estimation of decision model parameters was apparently first
suggested by Slagle (1971). He made the key observation that the process of
expected utility maximization involved a linear evaluation function that could 
be learned from a person's choices. 


The suggested technique was soon applied by Freedy, Weisbrod, and Weltman 
(1973) to the modeling of decision behavior in a simulated intelligence 
gathering context. Freedy and his associates assumed the decision maker to 
maximize expected utility on each decision. They assigned a distinct utility,
U(x_jk), to each possible combination of action and outcome, as shown in the
decision tree in Figure 3-4. The probabilities of occurrence of each outcome
j given each action k were determined using Bayesian techniques. These
patterns of probability were used as inputs to the estimation program (Figure
3-5). The expected utility of each action A_k was then calculated by forming
the dot product of the input probability vector and the respective utility
vector. This operation is equivalent to the expected utility calculation:


    EU(A_k) = \sum_j P(x_{jk}) \, U(x_{jk})                             (3-2)


The classification weight vector W_k in the pattern recognition program
acts as the utility U(x_jk). The alternative A_k having the maximum expected
utility is selected by the model and compared with the decision maker's choice.
If a discrepancy is observed, an adjustment is made, as shown in Figure 3-4.
The adjustment moves the utility vectors of the chosen and predicted actions
(W_c and W_p, respectively) in the direction minimizing the prediction error.
The adjustment consists of the following:


    W_c' = W_c + d \cdot P_c                                            (3-3)
    W_p' = W_p - d \cdot P_p                                            (3-4)


where

    W_c' is the new vector of weights [W(x_1c), W(x_2c)]
         for action c

    W_c is the previous weight vector for action c

    d is the correction increment

    P_k is the probability vector [P_1k, P_2k, ..., P_nk] describing the
        distribution of outcomes resulting from action k
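
A minimal Python sketch of this error-correcting adjustment follows. It assumes the sign convention that the chosen action's utility vector is moved toward its own probability vector and the predicted action's vector is moved away from its own, which matches the stated aim of reducing the prediction error. The function names and all numbers are hypothetical.

```python
def expected_utility(w_k, p_k):
    """Dot product of an action's utility vector and its outcome probabilities."""
    return sum(w * p for w, p in zip(w_k, p_k))


def eu_update(W, P, chosen, d=0.1):
    """Predict the action with maximum expected utility.

    W[k] is the utility (weight) vector for action k, P[k] the outcome
    probability vector for action k. When the prediction differs from the
    operator's choice, only the chosen and predicted vectors are adjusted
    (in place). Returns the predicted action index.
    """
    eus = [expected_utility(W[k], P[k]) for k in range(len(W))]
    predicted = max(range(len(W)), key=eus.__getitem__)
    if predicted != chosen:
        W[chosen] = [w + d * p for w, p in zip(W[chosen], P[chosen])]
        W[predicted] = [w - d * p for w, p in zip(W[predicted], P[predicted])]
    return predicted


# Hypothetical example: two actions, two outcomes; the operator keeps choosing 1.
W = [[0.6, 0.2], [0.3, 0.4]]
P = [[0.9, 0.1], [0.2, 0.8]]

rounds = 0
while eu_update(W, P, chosen=1) != 1:
    rounds += 1  # one adjustment per incorrect prediction
```

Because only two vectors move per error, a model with many actions adapts slowly, which is the limitation the multi-attribute formulation is meant to address.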
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This model is an adaptation of the R-category linear machine (Nilsson, 
1965). The pattern classifier receives patterns of descriptive data (outcome 
probabilities) and responds with a decision to classify each of the patterns 
in one of R categories (actions). The classification is made on the basis 
of R linear discriminant functions, each of which corresponds to one of the 
R categories. The discriminant functions are of the form: 


    g_i(x) = W_i \cdot x    for i = 1, 2, ..., R                        (3-5)

where x is the pattern vector and W_i is the weight vector. The pattern
classifier computes the value of each discriminant function and selects the
category i such that


    g_i(x) > g_j(x)                                                     (3-6)

for all j = 1, 2, ..., R, j \neq i.


A geometric interpretation of the R-category linear machine is shown in 
Figure 3-6 (Nilsson, 1965). Decisions involving two possible consequences,
x_1 and x_2, are evaluated according to three discriminant functions g_1(x),
g_2(x), and g_3(x). The lines of intersection between the discriminant
hyperplanes are the points of indifference between actions. Mappings of these
lines of intersection to the attribute plane are shown in the figure. The
resulting regions R_1, R_2, and R_3 correspond to the actions maximizing the
(expected utility) evaluation function.


The R-category technique becomes somewhat cumbersome if a large number 
of actions are possible or if the decision circumstances change rapidly. This 
problem is a result of the assignment of a distinct, holistic utility to each 
tip of the decision tree. The number of model parameters thus increases 
rapidly with an increase in the number of actions possible. Also, the only 
weight vectors adjusted in a given decision are those corresponding to the 
model-predicted and the actually chosen actions. This partial adjustment makes 
the system somewhat unresponsive to change. 


A natural extension of Freedy's approach is to adapt the single 
discriminant, multi-attribute approach to the modeling of objective choice 
behavior. Each possible outcome of a decision can be associated with a set 
of attributes or objectives of the decision maker. An importance weight vector 
defined over the various attributes can then be adjusted to predict behavior. 
The mechanism is simply that of a threshold logic unit. The adjustment rule
following an incorrect prediction is


    W' = W + d (x_c - x_p)                                              (3-7)

where

    W' is the updated weighting vector
    W is the previous weighting vector
    x_p is the attribute pattern of the model-predicted choice
    x_c is the attribute pattern of the decision maker's choice
    d is the adjustment factor
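
As a rough illustration, the single-weight-vector adjustment can be written as below. The sketch assumes a constant adjustment factor d and uses hypothetical names (dot, tlu_update, train_step) and made-up attribute patterns.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def tlu_update(w, x_chosen, x_predicted, d):
    """Move the weights by d times (chosen pattern minus predicted pattern)."""
    return [wi + d * (c - p) for wi, c, p in zip(w, x_chosen, x_predicted)]


def train_step(w, options, chosen, d=0.6):
    """Predict the option with the highest weighted score; adjust on error."""
    predicted = max(range(len(options)), key=lambda k: dot(w, options[k]))
    if predicted != chosen:
        w = tlu_update(w, options[chosen], options[predicted], d)
    return w, predicted


# Hypothetical example: two attributes, two options, operator picks option 1.
options = [[1.0, 0.0], [0.0, 1.0]]
w0 = [1.0, 0.0]
w1, first_prediction = train_step(w0, options, chosen=1)   # wrong, so adjusted
w2, second_prediction = train_step(w1, options, chosen=1)  # now correct
```

Unlike the per-action scheme, every attribute weight can move on each error, so a single observation informs the evaluation of all actions.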


A possible advantage of the pattern recognition technique over many of
the other forms of estimation is its flexibility of adjustment. Several types
of error correction are possible for the TLU, each with a different combination
of speed, stability, and complexity. The three principal forms are the fixed
increment rule, the absolute correction rule, and the fractional correction
rule. These differ solely in their formulation of the adjustment factor d in
Equation 3-7:

The fixed increment rule simply assigns a non-zero constant to d. Thus 
the movement of the weight vector is a constant proportion of the difference 
in the predicted and chosen patterns. The correction may not be sufficient 
to avoid subsequent errors with the same pattern, but the process is eventually
convergent (Duda and Hart, 1973). The fixed increment rule has the advantages
of simplicity and relative insensitivity to inconsistent behavior. 

A more rapid but potentially less stable rule is the absolute
correction rule. This method sets d to be the smallest integer at which the 
error of the pattern is corrected. In the decision modeling situation, this 
becomes: 


    d = smallest integer > \frac{|W \cdot (x_c - x_p)|}{(x_c - x_p) \cdot (x_c - x_p)}     (3-8)


in which

    x_c is the attribute level vector of the operator-selected choice
    x_p is the attribute vector of the predicted choice

The fractional correction rule is similar to the absolute rule but is
typically less extreme. The fractional rule moves the weight point some
fraction of the above distance:


    d = \lambda \frac{|W \cdot (x_c - x_p)|}{(x_c - x_p) \cdot (x_c - x_p)}


where \lambda is a constant, 0 < \lambda < 2.
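
The two adjustment factors can be computed as in the following sketch. It assumes the standard form of these rules for a threshold logic unit, in which the distance term is |W . (x_c - x_p)| divided by the squared length of (x_c - x_p); the function names and the example vectors are hypothetical, and the chosen and predicted patterns are assumed to differ.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _ratio(w, x_chosen, x_predicted):
    """|W . diff| / (diff . diff), where diff = x_chosen - x_predicted."""
    diff = [c - p for c, p in zip(x_chosen, x_predicted)]
    return abs(dot(w, diff)) / dot(diff, diff)


def absolute_d(w, x_chosen, x_predicted):
    """Smallest integer strictly greater than the ratio."""
    return math.floor(_ratio(w, x_chosen, x_predicted)) + 1


def fractional_d(w, x_chosen, x_predicted, lam=1.5):
    """A fraction lam (0 < lam < 2) of the ratio."""
    return lam * _ratio(w, x_chosen, x_predicted)


# Hypothetical example: the model prefers x_p, the operator chose x_c.
w = [3.0, 0.0]
x_c, x_p = [0.0, 1.0], [1.0, 0.0]

d_abs = absolute_d(w, x_c, x_p)
d_frac = fractional_d(w, x_c, x_p, lam=1.0)

# Applying the absolute correction makes the chosen pattern score higher.
w_new = [wi + d_abs * (c - p) for wi, c, p in zip(w, x_c, x_p)]
corrected = dot(w_new, x_c) > dot(w_new, x_p)
```

With lam equal to 1 the fractional rule moves the weight point exactly onto the pattern hyperplane; values above 1 carry it across.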
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All three of the adjustment rules have been proven convergent with 
linearly separable patterns (Nilsson, 1965). The speed of convergence is 
normally fastest with the absolute rule. This is illustrated for an example 
series of adjustments in Figure 3-7. The set of four numbered lines in the 
figure are a sequence of patterns. These patterns are shown as hyperplanes 
in a 2-dimensional weight space. Each hyperplane represents the difference 
between two multi-attribute vectors. The operator choice is shown by the 
direction of the arrow at each pattern. The absolute rule (the triangles
in the figure) is seen to achieve correct prediction after four observations,
while the fixed rule (the circles) requires five. Unfortunately, the absolute
rule is expected to be less forgiving of inconsistent behavior than the fixed 
or fractional rules. This is because of the large responses the absolute rule 
makes to operator inconsistencies. The fixed and fractional rules may exhibit 
a greater tendency to smooth or average the behavior. 
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SECTION IV 
THE HEURISTIC SEARCH APPROACH 
STATE SPACE MODEL 


The overall objective of knowledgeable opponent scenario generation is to 
provide a realistic simulation of an active enemy. The enemy would react to 
events and actions taken by the friendly forces and choose a course of action 
that would lead to the achievement of some enemy goal, which usually means a 
bad outcome for the friendly forces. The heuristic search approach provides 
such a mechanism. 


In the underlying model, which is called the "state space" model, the
problem domain (such as underwater warfare) is expressed in terms of "states",
which are complete descriptions of the tactical situations as they exist at
some particular instant of time (Nilsson, 1971). An "action" is a
transformation which, when applicable, converts one state into another. Thus,
a sequence of actions ("plan" or "allocation") converts some initial state
into a final, or goal, state. The enemy submarine commander asks the question,
"What sequence of actions can transform the current state into a goal state
which satisfies my overall objectives?" In other words, "How do I get from 
where I am to where I want to go?" Before a system can perform properly, it 
must know what actions are available, under what circumstances they can be 
applied, what their effects are, and what possible states can arise from their 
use. 


BASIC SEARCH TECHNIQUES


The most basic search techniques are systematic expansions of the state 
space. Starting from the start node (labeled 1 - the current state), the 
search algorithm expands all its possible successive nodes. When a goal node 
is encountered, the path from the initial node to that goal node is the 
solution sought. In the ASW case, it is the strategy, or sequence of actions,
the commander has to take to reach his objective. 


Figure 4-1 and Figure 4-2 show the most elementary algorithms - the
"breadth-first" and the "depth-first" algorithms, respectively. In the
"breadth-first" algorithm, each node is expanded completely - all its "sons"
identified - before the next one is expanded, and a given layer of nodes is
expanded before the next is started. This method is guaranteed to find the
shortest path from the start to the goal nodes. The numbers in Figure 4-1
indicate the order of node expansion.


In the "depth-first" algorithm, each alternative line of inquiry is sought 
to the fullest depth before other alternatives are evaluated. When such a 
search fails, the algorithm tries the next deepest possibility. Figure 4-2 
shows the order of node expansion in this algorithm. The depth-first algorithm
does not guarantee the shortest path to a goal if more than one goal node exists. 
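
The two blind expansion orders can be compared on a small example. The sketch below is illustrative only: the graph, the goal set, and the function names are hypothetical, and paths are returned as lists of node labels.

```python
from collections import deque


def breadth_first(start, is_goal, successors):
    """Expand nodes layer by layer; the first goal found is a shortest path."""
    frontier = deque([[start]])
    seen = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if is_goal(node):
            return path
        for nxt in successors(node):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None


def depth_first(start, is_goal, successors, limit=20):
    """Follow each line of inquiry to its full depth before trying another."""
    def visit(path):
        node = path[-1]
        if is_goal(node):
            return path
        if len(path) > limit:
            return None
        for nxt in successors(node):
            if nxt not in path:  # avoid cycling along the current path
                found = visit(path + [nxt])
                if found:
                    return found
        return None
    return visit([start])


# Hypothetical state space with two goal nodes (5 and 6).
GRAPH = {1: [2, 3], 2: [4], 3: [6], 4: [5], 5: [], 6: []}
GOALS = {5, 6}

bfs_path = breadth_first(1, GOALS.__contains__, GRAPH.__getitem__)
dfs_path = depth_first(1, GOALS.__contains__, GRAPH.__getitem__)
```

Here the breadth-first search returns the two-step path to node 6, while the depth-first search commits to the left branch and returns the longer path to node 5.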
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FIGURE 4-2 DEPTH-FIRST EXPANSION ORDER 
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These search methods are "blind" methods because they develop 
systematically every node in the state space without using any information 
which may be known in advance about the particular problem domain or the 
particular knowledge found in the nodes that have already been expanded to
guide the search process. The heuristic search approach is the class of 
algorithms that uses such domain specific knowledge to guide the search. 


HEURISTIC SEARCH METHODS 


Heuristic search methods try to utilize any information known about the 
problem domain to guide the search for a solution in the state space. The 
added information helps avoid the combinatorial explosion of computer resources 
(time and memory) needed for the basic search techniques. Figure 4-3 
illustrates the basic idea of the heuristic search approach by comparing it to 
depth first and breadth first searches. The contours of node expansion are 
directed toward the goals G1 and G2, in contrast to the blind search algorithms.
Applying a heuristic search usually leads to the discovery of optimal or 
suboptimal solutions in cases that would be too big to handle by standard 
techniques. Many achievements of heuristic search are known. For example, 


a. Computer Aided Design (Powers, 1973; Hagendorf et al, 1975) 


b. Test Sequence Generation for Detection of Failures in Clockmode
Sequential Circuits (Hill and Huey, 1977)


c. Edge and Contour Detection (Martelli, 1976)
d. Chromosome Matching (Montanari, 1970)
e. Organic Chemical Synthesis (Sridharan, 1973)
f. Ballistic Missile Defense (Leal, 1977)
g. Discovery of Mathematical Concepts (Lenat, 1978)


The heuristic information can be contained in different parts of the search
algorithm. If Γ is the function that generates node successors and f(n) is
an estimate of the promise of node n to be on the path to a goal node, then the
heuristic information may be contained in either of them. Using knowledge in
Γ, the search algorithm would generate first the more probable successors of a
node. On the other hand, using knowledge in f(n), the most promising nodes
would be selected for subsequent development ahead of less promising ones.
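
One way to place the knowledge in f(n) is a best-first search that always expands the open node with the lowest estimate. The sketch below is an assumption-laden illustration: the grid world, the goal position, and the Manhattan-distance estimate are hypothetical stand-ins for a tactical state space and its evaluation function.

```python
import heapq

SIZE = 5          # hypothetical 5 x 5 grid of states
GOAL = (3, 2)     # hypothetical goal state


def neighbors(node):
    """Successor function: the four adjacent grid cells that are in bounds."""
    x, y = node
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(cx, cy) for cx, cy in candidates
            if 0 <= cx < SIZE and 0 <= cy < SIZE]


def estimate(node):
    """f(n): Manhattan distance to the goal (lower means more promising)."""
    return abs(node[0] - GOAL[0]) + abs(node[1] - GOAL[1])


def best_first(start, is_goal, successors, f):
    """Always expand the most promising open node according to f."""
    counter = 0  # tie-breaker so paths are never compared directly
    frontier = [(f(start), counter, [start])]
    seen = {start}
    expanded = []
    while frontier:
        _, _, path = heapq.heappop(frontier)
        node = path[-1]
        expanded.append(node)
        if is_goal(node):
            return path, expanded
        for nxt in successors(node):
            if nxt not in seen:
                seen.add(nxt)
                counter += 1
                heapq.heappush(frontier, (f(nxt), counter, path + [nxt]))
    return None, expanded


path, expanded = best_first((0, 0), lambda n: n == GOAL, neighbors, estimate)
```

In this small case only the nodes along the returned path are expanded, whereas a blind breadth-first search would expand most of the grid before reaching the goal.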


FIGURE 4-3: EXPANSION CONTOURS OF DEPTH FIRST, BREADTH FIRST,
AND HEURISTIC SEARCH METHODS.
(Legend: breadth first search; heuristic search; depth first search;
search tree limit; G1, G2 goal nodes.)


THE MINIMAX AND ALPHA-BETA ALGORITHMS


[Figure: minimax search tree. Original situation, maximizing level: moves
available to simulated commander. New situations, minimizing level: responses
available to enemy. New situations, maximizing level.]


Two algorithms which have particular applicability to the case of military
confrontation are the minimax and the alpha-beta algorithms. The minimax is
applicable in zero-sum adversary confrontations where what is good for one side
is bad for the other. When developing the state space of such a problem, the
prudent
decision maker has to assume that, when given the choice, the enemy would
select the alternative which is the most damaging to the decision maker's own
objectives. When expanding the search space for this problem, as shown in
Figure 4-5, the commander first determines all the alternatives available to
him. This is the maximizing level because at this level the commander has
the choice, and he will obviously choose the alternative that maximizes his
objectives. The next level is the set of responses available to the enemy
for each of the commander's choices. Here the enemy will make the choice, and
he will choose the worst alternative (from the commander's point of view).
Thus, this layer is called the minimizing level. The alternation of maximizing
and minimizing layers goes on until the allocated computing resources are used
up. At that point, the static value of each tip node (i.e., "worth" of the
situation) is evaluated and the choices are made at each decision point. The
"backed-up" values propagate upward in the state space tree until they reach
the first layer. These values are the basis of the commander's choice among
the alternative actions available to him. This "minimaxing" algorithm is
repeated for every decision the simulated commander has to make; thus, it takes
into account the dynamics of the situation, and it finds the best tactical
move foreseeing the best choice of the enemy. In this algorithm, the heuristic
information is contained in the tip node evaluation function f(n) introduced in
the previous section.
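
The backing-up of tip values can be sketched as a short recursive procedure. The game tree, the tip values, and the function names below are hypothetical; a fixed depth limit stands in for the allocated computing resources.

```python
# Hypothetical two-ply tree: the commander moves first, the enemy responds.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
TIP_VALUES = {"a1": 0.2, "a2": 0.6, "b1": 0.1, "b2": 0.9}


def successors(node):
    return TREE.get(node, [])


def evaluate(node):
    """Static worth of a tip situation from the commander's point of view."""
    return TIP_VALUES[node]


def minimax(node, depth, maximizing):
    """Back up tip values, alternating maximizing and minimizing levels."""
    children = successors(node)
    if depth == 0 or not children:
        return evaluate(node)
    values = [minimax(c, depth - 1, not maximizing) for c in children]
    return max(values) if maximizing else min(values)


def best_move(node, depth):
    """Commander's move whose backed-up value is highest."""
    return max(successors(node),
               key=lambda c: minimax(c, depth - 1, maximizing=False))


root_value = minimax("root", 2, maximizing=True)
move = best_move("root", 2)
```

Move "a" is selected because the enemy's best reply to it leaves a value of 0.2, whereas the best reply to "b" leaves only 0.1, even though "b" contains the most attractive tip (0.9).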


The alpha-beta algorithm is an improved version of the basic minimax 
algorithm. It uses a common sense argument to prune the tree that has to be
developed. It can be shown that although the algorithm allows a large part of 
the search tree to be completely ignored, it will not lose any solution that 
the basic minimax algorithm would find. 


The alpha-beta algorithm starts with a depth-first search down to some 
level n (see Figure 4-5). When the depth limit is reached, the nodes are 
evaluated and temporary values are backed-up in the tree. The alpha-beta 
technique takes advantage of these preliminary values. Consider, in Figure 
4-5, the maximizing node A in the tree after nodes 4-9 have been developed 
below it. A has been assigned a temporary value of 0.2 (propagated from 
node 5). B, which is a minimizing node, has been assigned a temporary value
of 0.1 (propagated from node 9). 


At this time, there is no point developing any other successor to the node
B (such as C) because, since it is a minimizing node, the best value B can get
is 0.1 or lower, and node A, being a maximizing node, will always select 0.2 
over 0.1. This argument is the "alpha" half of the alpha-beta pruning. The 
dashed lines in Figure 4-5 show all the subtrees that will be pruned off and 
the order of node generation. 


The "beta" half operates in precisely the reverse for nodes in the minimizing
layers. By using the alpha-beta algorithm, the tree can be explored
approximately twice as deep as a simple minimax algorithm, while expanding the
same number of nodes. The algorithm is somewhat slower, inasmuch as it has
to do the bookkeeping for the temporary alpha and beta values. The alpha-beta
algorithm is a very promising potential opponent model.
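
A compact sketch of the pruning argument is given below. The tree and tip values are hypothetical and chosen to echo the 0.2 versus 0.1 situation discussed in the text; the visited list is only there to show which nodes were actually generated.

```python
# Hypothetical tree: node "a" backs up 0.2; under "b" the first tip is 0.1.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
TIP_VALUES = {"a1": 0.2, "a2": 0.6, "b1": 0.1, "b2": 0.9}


def alphabeta(node, depth, alpha, beta, maximizing, visited):
    """Minimax with alpha-beta cutoffs; records every node it generates."""
    visited.append(node)
    children = TREE.get(node, [])
    if depth == 0 or not children:
        return TIP_VALUES[node]
    if maximizing:
        value = float("-inf")
        for child in children:
            value = max(value,
                        alphabeta(child, depth - 1, alpha, beta, False, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizer above would never allow this line
        return value
    value = float("inf")
    for child in children:
        value = min(value,
                    alphabeta(child, depth - 1, alpha, beta, True, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizer above already has something at least as good
    return value


visited = []
root_value = alphabeta("root", 2, float("-inf"), float("inf"), True, visited)
```

After node "a" backs up 0.2, the first tip under "b" (0.1) is enough to abandon "b", so tip "b2" is never generated, yet the root value equals what plain minimax would return.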


ADVANTAGES 


a. Heuristic search techniques have a wide range of applicability, as can be
seen from the examples mentioned above. 


b. The underlying structure (state-space, AND/OR graphs) is very general
and fits naturally all problems of a combinatorial nature and all
hierarchical problems which can be decomposed into goals and subgoals
(this includes decision trees).


c. General theoretical results are available. 


d. It is universally accepted that heuristics are crucial to cope with 
intractable problems. 


SCOPE AND LIMITATIONS


a. Heuristic search techniques are designed for problems of a particular 
nature only, with well-defined states, subgoals or subproblems. 
Problems with a continuous nature, for instance planning in a 
continuum, cannot be solved via heuristic search. 


b. The use of heuristic search itself poses a problem. The more specific
a heuristic function, the more efficient it is in guiding the search.
How well designed and problem-specific the heuristics are will therefore
determine their efficiency.


c. Heuristic search might be subject to catastrophes (if no solution is
found after the computational resources are exhausted or an
insufficiently good solution is found).
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SECTION V. 
PRODUCTION RULES APPROACH 
OVERVIEW 


Production rule systems represent another successful approach for
knowledge representation and deductive mechanisms. This approach is similar
to the heuristic search approach in that it uses a modification of the state
space model as the underlying conceptualization. The technique of representing
the knowledge is different, however, and so is the mechanism which finds the
path from the current state to the goal state. The problem specific knowledge
(heuristics) is packaged in production-rule systems as small modular "chunks"
called productions.


A production is a rule which consists of a situation recognition part and
an action part. Thus a production is a "situation - action" pair in which the
left side is a list of things to watch for in the description of the current
state of the world, and the right side is the list of things to do in that
case.


In the case of submarine warfare, a production that guides the commander's 
actions may be something like: 


    If
        AND
            Enemy dominates area
            Enemy has not yet detected you
            You are out of his torpedo range
            You are in very shallow water
    Then
        Escape by sinking to bottom in silence


The effect of such a production is to respond to the situation when all the
aspects combined by the AND are present and change the current action from
whatever it was before to ESCAPE.


In addition to the large set of such productions, the production rule
system contains a triggering mechanism that uniformly checks all the productions
that apply in a given situation (by testing for truth of the left hand side of
each production) and applies those that are applicable - causing the situation
to change.
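
A production of this kind maps naturally onto a small data structure. The sketch below is one possible encoding, not the report's: the Production class, the trigger function, and the wording of the situation elements are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Production:
    """A situation-action pair; all conditions must hold (AND)."""
    name: str
    conditions: frozenset
    action: str


ESCAPE = Production(
    name="escape",
    conditions=frozenset({
        "enemy dominates area",
        "enemy has not yet detected you",
        "out of enemy torpedo range",
        "in very shallow water",
    }),
    action="escape by sinking to bottom in silence",
)


def applicable(rule, situation):
    """True when every condition of the rule appears in the situation."""
    return rule.conditions <= situation


def trigger(rules, situation):
    """Check every production uniformly and collect the applicable actions."""
    return [rule.action for rule in rules if applicable(rule, situation)]


situation = {
    "enemy dominates area",
    "enemy has not yet detected you",
    "out of enemy torpedo range",
    "in very shallow water",
    "self on attack mission",
}
actions_now = trigger([ESCAPE], situation)
actions_if_detected = trigger(
    [ESCAPE], situation - {"enemy has not yet detected you"})
```

Keeping each rule as an independent record is what allows experts to add or revise productions without touching the triggering code.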
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The main advantages of the production rule approach are the ease and 
modularity of the knowledge representation. Consequently, it is easy to 
elicit information from experts without requiring that they be programmers. 
The information is incremental; thus it is easily modified, updated and 
expanded into new areas of expertise. It is also usually argued by production 
rule proponents that this form of knowledge representation is highly compatible 
with human cognition, making it a very useful and powerful training tool. For
example, suppose an opponent commander model is built as a production rule 
system. It becomes very easy to communicate with the system and ask "why have 
you done that?" meaning what aspects of the situation or what actions of the 
trainee caused some unexpected response of the enemy commander. 


The trainee can discover specifically where he went wrong, and he can start 
in the middle and try other alternatives. At the same time, this is also a 
powerful debugging tool allowing experts to tune the system by following its 
reasoning process and identifying the specific cause for a mistaken conclusion 
which led to an unreasonable response. 


THE SYSTEM 
PRODUCTIONS 


As with AND/OR graphs, production systems are composed of two parts: the set
of productions and a mechanism to find a solution in a given situation. We will
discuss first a graphic representation of the productions themselves. A
simple production specifies a single conclusion which follows from the
simultaneous satisfaction of the situation recognition conditions. Any
particular conclusion may spring from any production. The conclusion specified
in a production follows from the AND or "conjunction" of the facts specified in
the premise recognition part. A conclusion reached by more than one production
is said to be the OR or "disjunction" of those productions. Depicting these
relationships graphically produces an AND/OR graph with directed edges. Figure
5-1 shows an AND/OR graph which reaches from base tactical facts (F_i) at the
bottom, through the different productions (P_i), to a conclusion or an act to
be taken at the top. Any collection of productions implies such a graph. In
Figure 5-1 we used the set of submarine warfare productions given in Figure 5-2.
These productions should be taken as an example of the capabilities of this
approach.


The arrangement of nodes in this graph focuses on how the conclusion can
be reached by various combinations of basic facts. As with ordinary AND/OR
trees, a conclusion is verified if it is possible to connect it with basic facts
through a set of satisfied AND/OR nodes. Different sets of facts can be used to
reach a given conclusion by selecting different branches at OR nodes.
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Figure 5-2 (Continued) 
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OR 
Location near enemy shore 
3 or more enemy ships in the area 
Anti sub ship in area 
Nuclear enemy sub 
THEN 
ENEMY DOMINATES SCENE 
P2, P3
IF
AND 
SELF WITHIN SENSOR RANGE 
ENEMY CHANGED COURSE 
THEN 
SELF DETECTED 
ELSE 
NOT DETECTED 


Figure 5-2 Production Rule Example 
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AND 
ESCAPE 
SELF in Islandic area 
THEN
Hide behind an island
P8, P9
IF
OR 
AND 
One enemy sub in area 
Self in deep water 
Enemy sub of same type 
AND 
Enemy surface ship alone 
NO ASW in air 
THEN 


SELF dominate 


Figure 5-2 (Continued) 
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Attack mission 
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THEN | 
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AND 
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Enemy surface ship alone 
Self under water 
Target in torpedo range 
THEN 


Use torpedo 
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Figure 5-2 (Continued) 
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AND 
Attack 
One enemy sub 
Target in torpedo range
NO ASW in air 
THEN 


Use torpedo 


Figure 5-2 (Continued) 
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Sometimes it is useful to look at the implied graph to get a better feel 
for the problem space, noting whether the reasoning is likely to be broad and 
shallow, narrow and deep, or broad and deep. Again, however, caution is in 
order. When used prominently in discussions of goals and subgoals, AND/OR
graph representations tend to make control look like a search problem with the
various search ideas becoming applicable. This position has its good and bad
features. One bad feature is that it can create a tendency to waste time with
an existing problem space rather than to make a better space, where less
search, if any, would be needed.


THE CONTROL MECHANISM


The control mechanism which utilizes the set of productions takes a 
collection of known facts about the situation and makes new conclusions 
according to productions that are satisfied by the initial facts. In operation, 
the user would first gather up all facts available and present them to the 
system. The control mechanism will then scan the production list for a 
production which has a matching situation part, i.e., all the premises in the 
left hand side are satisfied. This production will be activated and its action
side will change the facts known about the situation. In the example given,
if P1 is activated, it adds the conclusion that the "enemy dominates the
area" to the situation description.


Reasoning from base facts to a conclusion rarely entails using only a
single step, however. More often, intermediate facts are generated and used,
making the reasoning process more complicated and powerful. One consequence is
that the individual productions involved can be small, easily understood,
easily used, and easily created. Also notice that the intermediate facts added
by the lower level productions are tactical facts meaningful to the military
users of the system, resulting in many benefits. Using this approach, a
simulated submarine commander can produce a chain of conclusions leading to
intelligent tactical actions, even as a trainee commander makes his actions
dynamically.
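
The chaining of intermediate facts can be illustrated with a very small control loop. In this sketch the rule labels (P1a, P1b, P_escape, P_hide) and the wording of the facts are hypothetical, loosely modeled on the submarine examples; an OR premise is encoded as two productions sharing one conclusion.

```python
# Each production: (label, premises that must all hold, conclusion added).
RULES = [
    ("P1a", frozenset({"anti-sub ship in area"}), "enemy dominates area"),
    ("P1b", frozenset({"location near enemy shore"}), "enemy dominates area"),
    ("P_escape",
     frozenset({"enemy dominates area", "not detected",
                "out of torpedo range", "very shallow water"}),
     "escape"),
    ("P_hide", frozenset({"escape", "self in island area"}),
     "hide behind an island"),
]


def forward_chain(rules, base_facts):
    """Fire satisfied productions until no new fact can be added."""
    facts = set(base_facts)
    fired = []
    changed = True
    while changed:
        changed = False
        for label, premises, conclusion in rules:
            if conclusion not in facts and premises <= facts:
                facts.add(conclusion)   # intermediate fact, usable by later rules
                fired.append(label)
                changed = True
    return facts, fired


base = {"anti-sub ship in area", "not detected", "out of torpedo range",
        "very shallow water", "self in island area"}
facts, fired = forward_chain(RULES, base)
```

The list of fired labels doubles as a trace that could answer a trainee's "why have you done that?" question.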


In the event many productions have premise or situation specifications 
that are satisfied simultaneously, there must be some way of deciding among 
them. Here are some of the popular methods: 


a. All productions are arranged in one long list. The first matching 
production is the one used. The others are ignored. 


b. The matching production with the toughest requirements is the one used, 
where "toughest" means the longest list of constraining premise or 
situation elements. 


c. The matching production most recently used is used again. 


d. Some aspects of the total situation are considered more important. 
Productions matching high priority situation elements are 
privileged. 
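The four selection methods above can each be written as a small function 
over the list of currently matching productions. The sketch below is 
illustrative only; the field "last_used", the priority table, and the example 
productions A, B and C are assumptions introduced here, not part of the 
original system. 

```python
from typing import NamedTuple


class Production(NamedTuple):
    name: str
    premises: frozenset
    last_used: int = 0  # hypothetical: cycle number of the last firing


def first_match(matching):
    """Method a: the first matching production in list order."""
    return matching[0]


def most_specific(matching):
    """Method b: the production with the longest premise list."""
    return max(matching, key=lambda p: len(p.premises))


def most_recent(matching):
    """Method c: the production fired most recently."""
    return max(matching, key=lambda p: p.last_used)


def highest_priority(matching, priority):
    """Method d: the production matching the most important element."""
    def score(p):
        return max((priority.get(e, 0) for e in p.premises), default=0)
    return max(matching, key=score)


# Hypothetical set of productions whose premises are all satisfied.
matching = [
    Production("A", frozenset({"contact"}), last_used=3),
    Production("B", frozenset({"contact", "deep water", "night"}), last_used=1),
    Production("C", frozenset({"contact", "torpedo in water"}), last_used=7),
]
priority = {"torpedo in water": 10, "contact": 1}

print(first_match(matching).name,
      most_specific(matching).name,
      most_recent(matching).name,
      highest_priority(matching, priority).name)
```

The same set of matching productions yields a different winner under 
different methods, so the choice of method is itself a design decision that 
affects system behavior. 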


So far, the deduction oriented production system is assumed to work from 
known facts to new, deduced facts. Running this way, a system exhibits 
forward chaining. But backward chaining is also possible, for the production 
system user can hypothesize a conclusion or a desired final state and use 
the productions to work backward toward an enumeration of the facts that 
would support the hypothesis. For example (see Figure 5-1), in the case of 
a submarine commander, the system can start from the mission, e.g., attack 
enemy sub. Then chaining backward from P10, it will conclude that it has to 
achieve self-dominance. This can be achieved by confronting an enemy surface 
ship (P9) or an enemy sub of the same type in deep water (P8). Thus, by a 
small change of orientation, the same set of productions was used backwards. 
Knowing that a deduction-oriented production system can run forward or backward, 
which is better? The question is decided by the purpose of the reasoning and 
by the shape of the problem space. Certainly, if the goal is to discover all 
that can be deduced from a given set of facts, then the production system 
must run forward. On the other hand, if the purpose is to verify or deny a 
particular conclusion, or reach a desired situation through a sequence of 
actions, then the production system is probably best run backward from that 
conclusion. Avoiding needless fact accumulation is one reason. Indeed, no 
irrelevant facts need be checked at all. The production system can run 
backward from all premise elements as long as suitable productions exist. Using 
sensory systems to supply facts is necessary only when no productions apply. 
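

A rough sketch of backward chaining follows. The three productions are loose 
paraphrases of the P8, P9 and P10 discussion above, written here as an 
assumption since Figure 5-1 is not reproduced; the "observed" set stands in 
for the sensory systems, and the rule set is assumed to contain no cycles. 

```python
# Each production is (premises, conclusion). Paraphrased for illustration;
# these are not the exact rules of Figure 5-1.
PRODUCTIONS = [
    (("enemy surface ship confronted",), "self-dominance"),
    (("enemy sub of same type", "deep water"), "self-dominance"),
    (("self-dominance",), "attack enemy sub"),
]


def backward_chain(goal, productions, observed, asked):
    """Return True if the goal is supported by the observed facts.

    Facts are looked up in `observed` (the stand-in for sensors) only
    when no production concludes them; each lookup is logged in `asked`.
    Assumes the productions contain no circular dependencies.
    """
    rules = [premises for premises, conclusion in productions
             if conclusion == goal]
    if not rules:
        asked.append(goal)
        return goal in observed
    return any(
        all(backward_chain(p, productions, observed, asked) for p in premises)
        for premises in rules
    )


observed = {"enemy sub of same type", "deep water", "sea state 3"}
asked = []
result = backward_chain("attack enemy sub", PRODUCTIONS, observed, asked)
print(result, asked)
```

The fact "sea state 3" is never consulted, because no production on the path 
back from the hypothesis mentions it. This is the sense in which no 
irrelevant facts need be checked. 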


Deciding whether forward chaining or backward chaining is better depends, 
in part, on the shape of the space. Figure 5-3 illustrates this by way of two 
symmetric situations. All possible states are represented along with the 
operations that can change one state into a neighbor. In the first situation 
shown, forward chaining is better because there is a general fan in from the 
typical initial states toward the typical goal states. It is hard to get into 
a dead end. In the second situation, the shape favors backward chaining since 
there is fan out. 
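

The effect of shape can be made concrete with a toy state space. The sketch 
below does not reproduce Figure 5-3; it assumes a small binary tree of 15 
states as the fan-out case and its edge-reversed copy as the fan-in case, and 
it counts how many states an exhaustive search could visit in each direction. 

```python
from collections import deque


def reachable(start, edges):
    """Count the states a breadth-first search can reach from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in edges.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


def reverse(edges):
    """Reverse every operation, so search runs from goals toward facts."""
    rev = {}
    for state, successors in edges.items():
        for nxt in successors:
            rev.setdefault(nxt, []).append(state)
    return rev


# Hypothetical fan-out space: state i leads to states 2i+1 and 2i+2.
fan_out = {i: [2 * i + 1, 2 * i + 2] for i in range(7)}
fan_in = reverse(fan_out)

# Fan-out case: initial state 0, goal state 14.
out_forward = reachable(0, fan_out)
out_backward = reachable(14, reverse(fan_out))

# Fan-in case: initial state 14, goal state 0.
in_forward = reachable(14, fan_in)
in_backward = reachable(0, reverse(fan_in))

print(out_forward, out_backward, in_forward, in_backward)
```

In the fan-out case the backward search touches only the states on the path 
to the goal, while the forward search may wander over the whole space; in the 
fan-in case the roles are exchanged. 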


ADVANTAGES 


Proponents of production rule systems usually cite one or more of the 
following advantages: 


a. Production systems provide a powerful model of the basic human problem 
solving mechanisms. This results in easy expert elicitation, user 
communication at the comfortable level of military tactical concepts 
and terms, easy trouble-shooting, and good training capability. 


b. System states are meaningful to users, debuggers, etc.; thus an 
evaluation can be made on the tactical level rather than on the 
computer implementation level. 


c. Production systems enforce a homogeneous representation of knowledge, 
effectively separating the static data representation from the 
uniformly applied evaluation mechanism. 


d. The control mechanism is simple and explicit: it is clear from the 
current state what to do next and what productions are available. 


e. Production systems allow incremental growth through the addition of 
individual productions, without changes necessary to any others. 


f. Production systems allow unplanned, but useful, interactions which are 
not possible with control structures in which all procedure 
interactions are determined beforehand. A piece of knowledge, or a 
combination of such, can be applied whenever appropriate, not just 
whenever a programmer predicts it can be appropriate. This can lead 
to highly intelligent performance by systems with a surprisingly small 
set (several hundred) of productions. 


g. Providing explanation capability to the system is natural to implement. 
When some decision is made, the system can present the sequence of 
productions that led to that conclusion, thus exposing its "reasoning" 
about the situation. 


h. The production rule approach is as general as any other method based 
on the state space model. 


i. Productions can be quantified with probability information, leading to 
applicability in decision making and risk evaluation. 
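

The explanation capability mentioned in the list above amounts to recording 
which production produced each derived fact. The sketch below is one possible 
way to do this, offered as an assumption rather than as the method used in 
any particular system; the productions R1 and R2 and the fact strings are 
hypothetical. 

```python
# Each production is (name, premises, conclusion); hypothetical examples.
PRODUCTIONS = [
    ("R1", ("enemy sub detected", "enemy sub is faster"),
     "enemy dominates the area"),
    ("R2", ("enemy dominates the area",), "evade"),
]


def forward_chain_with_trace(initial_facts, productions):
    """Forward chain, remembering which production derived each fact."""
    known = set(initial_facts)
    why = {}  # derived fact -> (production name, premises used)
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in productions:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                why[conclusion] = (name, premises)
                changed = True
    return known, why


def explain(fact, why, depth=0):
    """Return indented lines tracing a fact back to the given facts."""
    indent = "  " * depth
    if fact not in why:
        return [f"{indent}{fact}  (given)"]
    name, premises = why[fact]
    lines = [f"{indent}{fact}  (by {name})"]
    for premise in premises:
        lines.extend(explain(premise, why, depth + 1))
    return lines


known, why = forward_chain_with_trace(
    {"enemy sub detected", "enemy sub is faster"}, PRODUCTIONS)
explanation = explain("evade", why)
print("\n".join(explanation))
```

Because every line of the trace is a tactical fact or a named production, the 
explanation stays at the level of the user rather than of the computer 
implementation. 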


DISADVANTAGES 


Some of the advantages of the production rule approach can become 
disadvantages if care is not exercised in the design process: 


a. Maintaining focus of attention: It would seem that PR systems allow 
knowledge to be tossed into the system homogeneously and incrementally 
without worry about relating new knowledge quanta to old. Thus, by 
relinquishing control, such systems allow unimportant productions to 
usurp center stage from more important productions, leading the process 
astray. 


b. Size problems: One particular problem is that production systems may 
break down if the amount of knowledge is too large, for then the number 
of productions grows beyond reasonable bounds. The advantage of not 
needing to worry about the interactions among the productions can 
become the disadvantage of not being able to influence the 
interactions among the larger number of productions. 


The possible solution, of course, is to partition the facts and 
the productions into subsystems such that at any time only a 
manageable number are under consideration. Within each subsystem, 
some productions may be devoted to arranging transfer of information 
or attention to another subsystem. Curiously, some users of Hewitt's 
ACTORS language produce programs that have a strong resemblance to 
systems of communicating production subsystems. 


This solution, however, goes against one of the main advantages 
of production rule systems, namely, modularity and independent control. 
If control guiding productions are added, we again have the problem of 
explicitly directing where control should go. 


c. Global effects: It is awkward to represent global effects using the PR 
approach. Here, again, the modularity of the productions requires that 
if some global effect (such as weather in ASW) takes part in many 
productions, the whole set of productions which behave differently 
must be duplicated for each different weather state. 
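

The duplication caused by a global effect can be illustrated with a short 
sketch. The base productions, the weather states, and the helper name 
"duplicate_for_global" are all hypothetical and are used only to show how the 
number of productions multiplies. 

```python
from itertools import product

# Hypothetical weather-sensitive productions: (premises, action).
BASE = [
    (("contact bearing known",), "close range"),
    (("contact lost",), "search"),
]

# Hypothetical states of the global effect.
WEATHER = ["calm", "storm", "fog"]


def duplicate_for_global(base, states, prefix="weather is "):
    """Make one copy of every production for every global state.

    Each copy gains the global state as an extra premise, and its action
    is tagged with that state to stand for the state-specific behavior.
    """
    return [
        ((prefix + state,) + premises, f"{action} [{state}]")
        for state, (premises, action) in product(states, base)
    ]


expanded = duplicate_for_global(BASE, WEATHER)
print(len(BASE), "productions become", len(expanded))
```

With two weather-sensitive productions and three weather states the set 
grows to six, and every further global effect multiplies the count again. 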